AITopics | english-centric model

Collaborating Authors

english-centric model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

Zhong, Chengzhi, Cheng, Fei, Liu, Qianying, Jiang, Junfeng, Wan, Zhen, Chu, Chenhui, Murawaki, Yugo, Kurohashi, Sadao

arXiv.org Artificial IntelligenceAug-20-2024

In this study, we investigate whether non-English-centric LLMs, despite their strong performance, `think' in their respective dominant language: more precisely, `think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal $\textbf{latent languages}$. We examine the latent language of three typical categories of models for Japanese processing: Llama2, an English-centric model; Swallow, an English-centric model with continued pre-training in Japanese; and LLM-jp, a model pre-trained on balanced English and Japanese corpora. Our empirical findings reveal that, unlike Llama2 which relies exclusively on English as the internal latent language, Japanese-specific Swallow and LLM-jp employ both Japanese and English, exhibiting dual internal latent languages. For any given target language, the model preferentially activates the latent language most closely related to it. In addition, we explore how intermediate layers respond to questions involving cultural conflicts between latent internal and target output languages. We further explore how the language identity shifts across layers while keeping consistent semantic meaning reflected in the intermediate layer representations. This study deepens the understanding of non-English-centric large language models, highlighting the intricate dynamics of language representation within their intermediate layers.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2408.10811

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Add feedback

How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data

Wu, Di, Tan, Shaomu, Meng, Yan, Stap, David, Monz, Christof

arXiv.org Artificial IntelligenceJan-22-2024

Zero-shot translation is an open problem, aiming to translate between language pairs unseen during training in Multilingual Machine Translation (MMT). A common, albeit resource-consuming, solution is to mine as many translation directions as possible to add to the parallel corpus. In this paper, we show that the zero-shot capability of an English-centric model can be easily enhanced by fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we show that up to +21.7 ChrF non-English overall improvements (870 directions) can be achieved by using only 100 multi-parallel samples, meanwhile preserving capability in English-centric directions. We further study the size effect of fine-tuning data and its transfer capabilities. Surprisingly, our empirical analysis shows that comparable overall improvements can be achieved even through fine-tuning in a small, randomly sampled direction set (10\%). Also, the resulting non-English performance is quite close to the upper bound (complete translation). Due to its high efficiency and practicality, we encourage the community 1) to consider the use of the fine-tuning method as a strong baseline for zero-shot translation and 2) to construct more comprehensive and high-quality multi-parallel data to cover real-world demand.

dataset, fine-tuning, translation, (15 more...)

arXiv.org Artificial Intelligence

2401.12413

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States (0.04)
North America > Dominican Republic (0.04)
(10 more...)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)

Add feedback

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

Gala, Jay, Chitale, Pranjal A., AK, Raghavan, Gumma, Varun, Doddapaneni, Sumanth, Kumar, Aswanth, Nawale, Janki, Sujatha, Anupama, Puduppully, Ratish, Raghavan, Vivek, Kumar, Pratyush, Khapra, Mitesh M., Dabre, Raj, Kunchukuttan, Anoop

arXiv.org Artificial IntelligenceDec-20-2023

India has a rich linguistic landscape with languages from 4 major language families spoken by over a billion people. 22 of these languages are listed in the Constitution of India (referred to as scheduled languages) are the focus of this work. Given the linguistic diversity, high-quality and accessible Machine Translation (MT) systems are essential in a country like India. Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India. In this work, we aim to address this gap by focusing on the missing pieces required for enabling wide, easy, and open access to good machine translation systems for all 22 scheduled Indian languages. We identify four key areas of improvement: curating and creating larger training datasets, creating diverse and high-quality benchmarks, training multilingual models, and releasing models with open access. Our first contribution is the release of the Bharat Parallel Corpus Collection (BPCC), the largest publicly available parallel corpora for Indic languages. BPCC contains a total of 230M bitext pairs, of which a total of 126M were newly added, including 644K manually translated sentence pairs created as part of this work. Our second contribution is the release of the first n-way parallel benchmark covering all 22 Indian languages, featuring diverse domains, Indian-origin content, and source-original test sets. Next, we present IndicTrans2, the first model to support all 22 languages, surpassing existing models on multiple existing and new benchmarks created as a part of this work. Lastly, to promote accessibility and collaboration, we release our models and associated data with permissive licenses at https://github.com/AI4Bharat/IndicTrans2.

bharat parallel corpus collection, european language resource association, indictrans2-m2m model, (12 more...)

arXiv.org Artificial Intelligence

2305.16307

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.13)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(45 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Education (1.00)
Government > Regional Government > Asia Government > India Government (0.66)
Consumer Products & Services > Travel (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability

Ye, Jiacheng, Tao, Xijia, Kong, Lingpeng

arXiv.org Artificial IntelligenceJun-11-2023

Multilingual transfer ability, which reflects how well the models fine-tuned on one source language can be applied to other languages, has been well studied in multilingual pre-trained models (e.g., BLOOM). However, such ability has not been investigated for English-centric models (e.g., LLaMA). To fill this gap, we study the following research questions. First, does multilingual transfer ability exist in English-centric models and how does it compare with multilingual pretrained models? Second, does it only appears when English is the source language for the English-centric model? Third, how does it vary in different tasks? We take multilingual reasoning ability as our focus and conduct extensive experiments across four types of reasoning tasks. We find that the multilingual pretrained model does not always outperform an English-centric model. Furthermore, English appears to be a less suitable source language, and the choice of source language becomes less important when the English-centric model scales up. In addition, different types of tasks exhibit different multilingual transfer abilities. These findings demonstrate that English-centric models not only possess multilingual transfer ability but may even surpass the transferability of multilingual pretrained models if well-trained. By showing the strength and weaknesses, the experiments also provide valuable insights into enhancing multilingual reasoning abilities for the English-centric models.

english-centric model, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.06688

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback